Similarities between Arabic dialects: Investigating geographical proximity

Abstract

The automatic classification of Arabic dialects is an ongoing research challenge, which has been explored in recent work that defines dialects based on increasingly limited geographic areas like cities and provinces. This paper focuses on a related yet relatively unexplored topic: the effects of the geographical proximity of cities located in Arab countries on their dialectical similarity. Our work is twofold, reliant on: 1) comparing the textual similarities between dialects using cosine similarity and 2) measuring the geographical distance between locations. We study MADAR and NADI, two established datasets with Arabic dialects from many cities and provinces. Our results indicate that cities located in different countries may in fact have more dialectical similarity than cities within the same country, depending on their geographical proximity. The correlation between dialectical similarity and city proximity suggests that cities that are closer together are more likely to share dialectical attributes, regardless of country borders. This nuance provides the potential for important advancements in Arabic dialect research because it indicates that a more granular approach to dialect classification is essential to understanding how to frame the problem of Arabic dialect identification.
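The two-part comparison described in the abstract (textual cosine similarity plus geographical distance, followed by a correlation between the two) can be sketched roughly as follows. This is a minimal illustration only, not the authors' implementation: the whitespace bag-of-words representation, the haversine distance, the Pearson coefficient, and every function name are assumptions, the per-city texts are made-up placeholder tokens rather than MADAR or NADI data, and the coordinates are approximate.

```python
import math
from collections import Counter
from itertools import combinations


def cosine_similarity(text_a, text_b):
    """Cosine similarity of whitespace-tokenised bag-of-words vectors (assumed representation)."""
    a, b = Counter(text_a.split()), Counter(text_b.split())
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def haversine_km(p, q):
    """Great-circle distance in km between two (latitude, longitude) points in degrees."""
    lat1, lon1, lat2, lon2 = map(math.radians, (*p, *q))
    h = math.sin((lat2 - lat1) / 2) ** 2 + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2
    return 2 * 6371.0 * math.asin(math.sqrt(h))


def pearson(xs, ys):
    """Pearson correlation coefficient; assumes neither sequence is constant."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def similarity_distance_correlation(cities):
    """Correlate pairwise text similarity with pairwise distance over all city pairs."""
    sims, dists = [], []
    for (_, (pos_a, text_a)), (_, (pos_b, text_b)) in combinations(cities.items(), 2):
        sims.append(cosine_similarity(text_a, text_b))
        dists.append(haversine_km(pos_a, pos_b))
    return pearson(sims, dists)


# Hypothetical toy input: approximate coordinates, placeholder tokens instead of real dialect text.
TOY_CITIES = {
    "Cairo": ((30.04, 31.24), "w1 w2 w3 w4"),
    "Alexandria": ((31.20, 29.92), "w1 w2 w3 w5"),
    "Amman": ((31.95, 35.93), "w3 w5 w6 w7"),
    "Damascus": ((33.51, 36.29), "w5 w6 w7 w8"),
}

if __name__ == "__main__":
    print(similarity_distance_correlation(TOY_CITIES))
```

A negative coefficient from such a computation would correspond to the pattern the abstract reports, namely that similarity tends to fall as distance grows; the toy input here says nothing about the real datasets.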

Similar resources

Parsing Arabic Dialects

The Arabic language is a collection of spoken dialects with important phonological, morphological, lexical, and syntactic differences, along with a standard written language, Modern Standard Arabic (MSA). Since the spoken dialects are not officially written, it is very costly to obtain adequate corpora to use for training dialect NLP tools such as parsers. In this paper, we address the problem ...

Variation in polar interrogative contours within and between Arabic dialects

Fundamental frequency (F0) contours in yes/no-questions and coordinated questions are analysed quantitatively and compared across eight Arabic dialects, based on scripted role play data from the Intonational Variation in Arabic corpus [1]. Visualisation of the F0 contour of all tokens is used to evaluate how consistently speakers produce a typical contour in each dialect, for each question type. ...

Machine Translation of Arabic Dialects

Arabic dialects present many challenges for machine translation, not least of which is the lack of data resources. We use crowdsourcing to cheaply and quickly build Levantine-English and Egyptian-English parallel corpora, consisting of 1.1M words and 380k words, respectively. The dialectal sentences are selected from a large corpus of Arabic web text, and translated using Amazon’s Mechanical Tur...

Automatic Identification of Arabic Dialects

In this work, automatic recognition of Arabic dialects is proposed. An acoustic survey of the proportion of vocalic intervals and the standard deviation of consonantal intervals in nine dialects (Tunisia, Morocco, Algeria, Egypt, Syria, Lebanon, Yemen, the Gulf countries and Iraq) is performed using the Alize platform and Gaussian Mixture Models (GMM). The results show the complexity of the autom...

Morphological Analysis and Generation for Arabic Dialects

We present MAGEAD, a morphological analyzer and generator for the Arabic language family. Our work is novel in that it explicitly addresses the need for processing the morphology of the dialects. MAGEAD provides an analysis to a root+pattern representation, it has separate phonological and orthographic representations, and it allows for combining morphemes from different dialects.

Journal

Journal title: Information Processing and Management

Year: 2022

ISSN: 0306-4573, 1873-5371

DOI: https://doi.org/10.1016/j.ipm.2021.102770